With recent advances in CNNs, exceptional improvements have been made in semantic segmentation of high-resolution images in terms of accuracy and latency. However, challenges remain in detecting objects under crowded scenes, large scale variations, partial occlusion, and distortions, while still maintaining mobility and low latency. We introduce a fast and efficient convolutional neural network, ASBU-Net, for semantic segmentation of high-resolution images that addresses these problems and uses no novel layers, for ease of quantization and embedded hardware support. ASBU-Net is based on a new feature extraction module, the atrous space bender layer (ASBL), which is efficient in terms of computation and memory. The ASB layers form a building block that is used to make ASBNet. Since this network does not use any special layers, it can be easily implemented, quantized, and deployed on FPGAs and other hardware with limited memory. We present experiments on resource and accuracy trade-offs and show strong performance compared to other popular models.
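As a rough illustration of the "atrous" part of the name only, the sketch below shows a plain 1D atrous (dilated) convolution in NumPy. It is not the ASBL module, whose internals the abstract does not describe; the function name `atrous_conv1d` and the `rate` parameter are hypothetical and chosen for this example.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1D atrous (dilated) convolution in cross-correlation form.

    Hypothetical helper for illustration; `rate` is the spacing between
    kernel taps, so rate=1 is an ordinary convolution.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * rate + 1   # receptive field of one output sample
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.zeros(n_out)
    for k, w in enumerate(kernel):
        # Tap k reads the input shifted by k * rate samples.
        out += w * signal[k * rate : k * rate + n_out]
    return out

x = np.arange(10, dtype=float)
dense = atrous_conv1d(x, [1.0, 1.0, 1.0], rate=1)    # each output sees 3 inputs
dilated = atrous_conv1d(x, [1.0, 1.0, 1.0], rate=2)  # each output sees a span of 5
```

The dilated variant widens the receptive field from 3 to 5 samples with the same three weights, which is the usual reason atrous convolutions are attractive when computation and memory are limited.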
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
Experience management is an emerging business area where organizations focus on understanding the feedback of customers and employees in order to improve their end-to-end experiences. This results in a unique set of machine learning problems to help understand how people feel, discover issues they care about, and find which actions need to be taken on data that are different in content and distribution from traditional NLP domains. In this paper, we present a case study of building text analysis applications that perform multiple classification tasks efficiently in 12 languages in the nascent business area of experience management. In order to scale up modern ML methods on experience data, we leverage cross lingual and multi-task modeling techniques to consolidate our models into a single deployment to avoid overhead. We also make use of model compression and model distillation to reduce overall inference latency and hardware cost to the level acceptable for business needs while maintaining model prediction quality. Our findings show that multi-task modeling improves task performance for a subset of experience management tasks in both XLM-R and mBert architectures. Among the compressed architectures we explored, we found that MiniLM achieved the best compression/performance tradeoff. Our case study demonstrates a speedup of up to 15.61x with 2.60% average task degradation (or 3.29x speedup with 1.71% degradation) and estimated savings of 44% over using the original full-size model. These results demonstrate a successful scaling up of text classification for the challenging new area of ML for experience management.
Federated learning (FL) enables the building of robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package, and allows researchers to bring their data science workflows implemented in any training libraries (PyTorch, TensorFlow, XGBoost, or even NumPy) and apply them in real-world FL settings. This paper introduces the key design principles of FLARE and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.
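Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense ones. Although a large number of sparsification methods have been proposed for other domains, such methods have rarely been applied to TTS. In this work, we seek to answer the question: what are the characteristics of selected sparsification techniques in terms of performance and model complexity? We compare a Tacotron2 baseline with the results of applying five techniques. We then evaluate performance via naturalness, intelligibility, and prosody, while reporting model size and training time. Complementing prior research, we find that pruning before or during training can achieve performance similar to pruning after training while training much faster, whereas removing entire neurons degrades performance far more than removing individual parameters. To the best of our knowledge, this is the first work that compares sparsity paradigms in text-to-speech synthesis.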
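The contrast between removing individual parameters and removing entire neurons can be sketched on a single weight matrix. This is a minimal NumPy illustration, assuming magnitude-based selection and treating rows as output neurons; it is not one of the five techniques evaluated in the paper, which the abstract does not name. The helper names `magnitude_prune` and `neuron_prune` are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the smallest-magnitude individual weights."""
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))            # number of weights to remove
    order = np.argsort(np.abs(w), axis=None)     # flat indices, smallest first
    mask = np.ones(w.size, dtype=bool)
    mask[order[:k]] = False
    mask = mask.reshape(w.shape)
    return w * mask, mask

def neuron_prune(weights, sparsity):
    """Structured pruning: zero whole rows (assumed to be output neurons)
    with the smallest L2 norm."""
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.shape[0]))        # number of neurons to remove
    norms = np.linalg.norm(w, axis=1)
    keep = np.ones(w.shape[0], dtype=bool)
    keep[np.argsort(norms)[:k]] = False
    return w * keep[:, None], keep

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
W_unstructured, weight_mask = magnitude_prune(W, 0.5)
W_structured, neuron_mask = neuron_prune(W, 0.5)
```

Both calls remove half of the parameters, but the structured variant discards every weight of four neurons at once, including large ones. That is consistent with the abstract's observation that neuron removal costs more quality than parameter removal at a comparable sparsity.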
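Full waveform inversion (FWI) is commonly regarded as the state-of-the-art approach for imaging subsurface structures and physical parameters. However, its implementation usually faces great challenges, such as building a good initial model to escape local minima and evaluating the uncertainty of inversion results. In this paper, we propose the implicit full waveform inversion (IFWI) algorithm, which uses continuously and implicitly defined deep neural representations. Compared to FWI, which is sensitive to the initial model, IFWI benefits from the increased degrees of freedom of deep learning optimization, which allows it to start from a random initialization and greatly reduces the risk of non-uniqueness and of being trapped in local minima. Both theoretical and experimental analyses indicate that, given a random initial model, IFWI is able to converge to the global minimum and produce a high-resolution image of the subsurface with fine structures. In addition, uncertainty analysis of IFWI can be easily performed by approximating Bayesian inference with various deep learning approaches; in this paper it is analyzed by adding dropout neurons. Furthermore, IFWI has a certain degree of robustness and strong generalization ability, as exemplified in experiments on various 2D geological models. With proper setup, IFWI can also be well suited to multi-scale joint geophysical inversion.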
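This paper investigates post-hoc calibration of confidence for "exploratory" machine learning classification problems. The difficulty in these problems stems from the continuing desire, when curating datasets, to push the boundaries of which categories have enough examples to generalize from, and from confusion regarding the validity of those categories. We argue that for such problems the "one-versus-all" approach (top-label calibration) must be used rather than the "calibrate-the-full-response-matrix" approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation. Chief among these methods is the use of kernel density ratios for confidence calibration, including a novel, bulletproof algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, should be based only on the test dataset, and should be sanity-checked visually.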
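The kernel density ratio idea can be sketched as follows: estimate the density of top-label confidence scores separately for correct and incorrect predictions, then combine them with the empirical accuracy through Bayes' rule. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. In particular, the bandwidth here uses the standard Silverman-style rule of thumb rather than the paper's novel selection algorithm (which the abstract does not specify), and all function names and the synthetic Beta-distributed scores are hypothetical.

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """Silverman-style rule of thumb (an assumption, NOT the paper's method)."""
    x = np.asarray(x, dtype=float)
    return max(1.06 * x.std(ddof=1) * len(x) ** (-1 / 5), 1e-3)

def gaussian_kde(points, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    points = np.asarray(points, dtype=float)
    norm = len(points) * bandwidth * np.sqrt(2 * np.pi)
    def density(s):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        z = (s[:, None] - points[None, :]) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=1) / norm
    return density

def fit_top_label_calibrator(scores, correct):
    """Map a raw top-label score s to an estimate of P(correct | s)."""
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos, neg = scores[correct], scores[~correct]
    prior = correct.mean()                       # empirical accuracy
    f_pos = gaussian_kde(pos, rule_of_thumb_bandwidth(pos))
    f_neg = gaussian_kde(neg, rule_of_thumb_bandwidth(neg))
    def calibrate(s):
        num = prior * f_pos(s)
        den = num + (1 - prior) * f_neg(s)
        return num / np.maximum(den, 1e-300)     # guard against 0/0
    return calibrate

# Synthetic scores for illustration only: correct predictions skew high.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(8, 2, 700), rng.beta(2, 2, 300)])
correct = np.concatenate([np.ones(700, bool), np.zeros(300, bool)])
calibrate = fit_top_label_calibrator(scores, correct)
calibrated = calibrate([0.4, 0.95])
```

Because only the top-label score and a correct/incorrect flag are needed, the calibrator never has to model the full response matrix, which matches the one-versus-all framing argued for in the abstract.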
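Biomarkers identify a patient's response to treatment. Despite recent advances in artificial intelligence based on Transformer networks, only limited research has measured their performance on challenging histopathology images. In this paper, we investigate the efficacy of numerous state-of-the-art Transformer networks for protein cell segmentation in colon cancer from immunohistochemistry (IHC) slides. Extensive and comprehensive experimental results confirm that MISSFormer achieved the highest Dice score, 74.85%, compared with the other evaluated Transformer and Efficient U-Net methods.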
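As deep neural networks (DNNs) have become an increasingly ubiquitous workload, the range of libraries and tools available to aid their development and deployment has grown significantly. Scalable, production-quality tools are freely available under permissive licenses and are accessible enough to make even small teams very productive. However, within the research community, awareness and usage of these tools is not necessarily widespread, and researchers may be missing out on potential productivity gains from exploiting the latest tools and workflows. This paper presents a case study in which we discuss our recent experience producing an end-to-end artificial intelligence detection application. We detail the high-level deep learning libraries, containerized workflows, continuous integration/deployment pipelines, and open-source code templates we leveraged to produce a competitive result, matching the performance of other ranked solutions on our three target datasets. We highlight the value that exploiting such systems can bring, even for research, detail our solution, and present our results in terms of accuracy and inference time on a server-class GPU, as well as inference times on a server-class CPU and a Raspberry Pi 4.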
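Auto-scheduling for tensor programs is a process in which a search algorithm automatically explores candidate schedules (program transformations) for a given program on a target hardware platform to improve its performance. However, this can be a very time-consuming process, depending on the complexity of the tensor program and the capacity of the target device, with often many thousands of program variants being explored. To address this, in this paper we introduce the idea of transfer-tuning, a novel approach to identify and reuse auto-schedules between tensor programs. We demonstrate this concept using deep neural networks (DNNs), taking sets of auto-schedules from pre-tuned DNNs and using them to reduce the inference time of a new DNN. We compare transfer-tuning against the state-of-the-art Ansor auto-scheduler, defining the maximum possible speedup for a given DNN model as what Ansor achieves using its recommended full tuning time. On a server-class CPU and across 11 widely used DNN models, we observe that transfer-tuning achieves up to $88.41\%$ ($49.13\%$ on average) of this maximum speedup, while Ansor requires $6.5\times$ more search time on average to match it. We also evaluate transfer-tuning on a constrained edge CPU and observe that the differences in search time are exacerbated, with Ansor requiring $10.8\times$ more time on average to match transfer-tuning's speedup, which further demonstrates its value. Our code is available at https://www.github.com/giclab/transfer-tuning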